Abstract

We consider identification of distributed systems via adaptive wavelet neural networks
(AWNNs). We take advantage of the multiresolution property of wavelet systems
and the computational structure of neural networks to approximate the unknown plant
successively. A systematic approach is developed in this paper to find the optimal
discrete orthonormal wavelet basis with compact support for spanning the subspaces
employed for system identification. We then apply the backpropagation algorithm to train
the network with supervision to emulate the unknown system. This work is applicable
to signal representation and compression under the optimal orthonormal wavelet
basis in addition to autoregressive system identification and modeling. We anticipate
that this work will prove intuitive for practical applications in the areas of controls and
signal processing.
1  Introduction


There are two well known types of system identification schemes, parametric and
nonparametric. The former depends on the given model structure used in identification and
determines the model’s parameters based on the input and output of the unknown system.
The latter does not require information regarding the model structure and
gives an estimate of the impulse response of the unknown system. However, some cases are
not suitable for these conventional approaches due to insufficient analytical
knowledge of the plant, incomplete information on the number of key parameters and the
presence of disturbances and uncertainties. Even when enough knowledge about the system
is available, the model of the system may be too complicated to be used to design control
systems.

We are interested in introducing another form of identification scheme which employs a
parallel computational structure and uses knowledge from measurement to adapt to different
models and structures. This method can be used for both linear and nonlinear system
identification. The underlying idea is twofold: first, identify the type or class of the system
and pick a simple component or a structure which describes the characteristics of the system;
second, start from the simple structure to build a basis to generate or approximate the
given systems successively in an appropriate functional space.

We have found recent advances in wavelet theory encouraging for generating an
autoregressive modeling structure for system identification and signal approximation in
$L^2(R)$. There has been extensive research interest and activity in wavelet theory and
its applications in recent years [4] [7]. The most attractive features of wavelet theory
are the multiresolution property and the ability to localize in time and frequency. The wavelet
transform decomposes a signal into its components at different resolutions. Its application
simplifies the description of signals and provides analysis at different levels of detail.
There are some successful applications of these properties in the fields of signal processing,
speech processing and especially image processing [16] [12]. It was shown [13] that it is
possible to derive a base wavelet function $\psi(x) \in L^2(R)$ such that, for $j, l \in Z$,
$\{\psi_{j,l}(x)\}_{j,l \in Z}$ with

$$\psi_{j,l}(x) = \sqrt{2^j}\,\psi(2^j x - l) \tag{1}$$

is an orthonormal basis of $L^2(R)$. Any square integrable function $f(x) \in L^2(R)$ can be
represented as

$$f(x) = \sum_{j,l} w_{j,l}\,\psi_{j,l}(x), \tag{2}$$

where the coefficients $w_{j,l}$ carry the information of $f(x)$ near frequency $2^j$ and near
$x = 2^{-j} l$. Any signal in $L^2(R)$ can be decomposed into its components at different scales
in subspaces of $L^2(R)$ of corresponding resolutions, and the reverse is true when the regularity
condition for the base wavelet $\psi(x)$ is introduced [7] [13]. The base function $\psi(x)$ plays
a central role in this formulation.
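The dilation and shift construction of Equation (1) can be checked numerically. The following is a minimal sketch, assuming the Haar wavelet as the base function (an illustrative choice, not the optimal wavelet sought in this paper); the names `haar`, `psi_jl` and `inner` are hypothetical helpers. It builds the family for a few scales on [0, 1), confirms that the Gram matrix is the identity, and computes one coefficient of Equation (2) for f(x) = x.

```python
import numpy as np

def haar(x):
    """Haar base wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere (assumed example)."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

def psi_jl(x, j, l):
    """Dilated and shifted wavelet sqrt(2^j) * psi(2^j x - l), as in Equation (1)."""
    return np.sqrt(2.0 ** j) * haar(2.0 ** j * x - l)

# Midpoint grid on [0, 1); fine enough that every wavelet below is constant on each cell.
n = 1024
x = (np.arange(n) + 0.5) / n
dx = 1.0 / n

def inner(f, g):
    """Midpoint-rule approximation of the L2 inner product on [0, 1)."""
    return float(np.sum(f * g) * dx)

# All wavelets at scales j = 0..3 whose support lies inside [0, 1).
indices = [(j, l) for j in range(4) for l in range(2 ** j)]
gram = np.array([[inner(psi_jl(x, j, l), psi_jl(x, p, q)) for (p, q) in indices]
                 for (j, l) in indices])

# Coefficient w_{0,0} of f(x) = x in the expansion of Equation (2).
w_00 = inner(x, psi_jl(x, 0, 0))
```

Here `gram` is the identity up to rounding, which is the orthonormality statement for this finite subfamily, and `w_00` evaluates the inner product of f with the coarsest wavelet.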

We consider identification as constructing a suitable subspace of $L^2(R)$ and generating a
function to approximate the output of the system with respect to the input, since a large class
of transfer functions of flexible structure systems and distributed systems belong to $L^2(R)$.
The identification of a transfer function or an input-output relation can thus be formulated
as the approximation of a function in $L^2(R)$ by its projection on an appropriate subspace of
$L^2(R)$. If we can construct a suitable subspace of $L^2(R)$ at an appropriate scale spanned by
dilating and shifting a base wavelet function, we should be able to approximate a function in
$L^2(R)$ with a function in the subspace of the relevant resolution in the sense of minimizing
a norm of the difference between the two functions. Naturally, the best approximation
is predetermined by the subspace in consideration and thus by the base wavelet, which
determines the dynamical characteristics of the subspace used for approximation. If partial
information on the system is available a priori, or the class of the function to be approximated
is detected, an appropriate wavelet basis can be built and the multiresolution property
can be used to approximate the function regressively.

When a function in $L^2(R)$ or a transfer function in $H^2(R)$ is unknown, the wavelet
system is feasible for its identification. Some work relating wavelets to linear systems can
be found in [15]. Since a closed expression is usually not available for practical purposes,
it is necessary to use a sum of a finite number of functions, typically of lower order or less
complexity, to approximate the original transfer function. A wavelet system can be
implemented to emulate the unknown system. This process is completed by adjusting the
coefficients with respect to the wavelet basis.

The transfer function of an infinite dimensional system or a distributed system is usually
a sum of infinitely many functions of certain classes. Under certain conditions, a distributed
system with a transfer function $G(x, \xi, s)$ can be represented by infinitely many parallel
aperiodic distributed blocks and oscillatory blocks [3],

$$G(x, \xi, s) = \sum_{i=1}^{\infty} G_i(s)\,p_i(x)\,q_i(\xi), \tag{3}$$

where $p_i$ and $q_i$ are the eigenfunctions of the corresponding boundary value problems. The
Green’s function, which is the system’s impulse response, has a similar structure. We shall
use $G(s)$ to denote the above transfer function for clarity in notation. When we mention
a transfer function in the rest of the paper, we refer either to a concrete transfer function or
to an implied input-output relation. The summation form of both transfer functions and
the Green’s function can be arranged in a tree-like structure; the sum of the weighted
functions can be laid out as a summation of the weighted subsums of similar structures.
The weighted sum reminds us of a computational structure: neural networks. We arrange
the wavelet system in a similar fashion such that the coefficients of the system turn into the
synapses of the neural network. The process of approximation becomes training of the
neural network, and techniques from neural networks are applicable.

A general representation of a neural network is a computational structure of finite linear
combinations of the form

$$\psi(x) = \sum_{j=1}^{N} w_j\,\sigma(a_j^T x + b_j), \tag{4}$$

where $x, a_j \in R^N$ and $b_j \in R$ are fixed. The network is formed from weighted compositions and
superpositions of a single, simple nonlinear pattern or response function. The univariate
function $\sigma$ depends heavily on the context of the application. Neural networks have found
applications in controls and system identification. A neural network was used as an
emulator and controller to control a highly nonlinear truck-trailer docking problem in [14].
Some applications of neural networks have been studied and summarized in [8] regarding
modeling, identification and control structures. The nonlinear functional mapping properties
of neural networks are central to their applications in system identification and controls.
It has been proven [6] that a two-layer neural network can approximate a nonlinear function
to an arbitrary degree of accuracy. However, the number of neurons required in the
network may far exceed the limit for practical implementations. This poses a burden for
applications in on-line system identification and real-time system control. An issue
in control is the dynamical nature of the system. When proper dynamics are included in
the neural network, the performance of the network is expected to improve. With
wavelet dilations incorporated into the network, the signals to the neurons are preprocessed
by the wavelet blocks. We anticipate that the information from the wavelet basis will reduce
the number of neurons needed to achieve the same performance, provided that the wavelets
contain useful information on the systems in consideration.

Our thoughts on a unified work on wavelets and neural networks are further encouraged
by the work in [20]. We are interested in a new formulation using both the multiresolution
property from wavelet decomposition and the convenience of the computational structure of
neural networks to approximate the unknown plants. We introduce in this paper a self-tuning
wavelet neural network which adjusts its wavelet basis according to measurements.
We call it an adaptive wavelet neural network (AWNN).
Figure  1:  A  wavelet  neural  network  identification  structure


This paper is organized as follows. The next section formulates the identification problem
via adaptive wavelet neural networks. The third section provides details on the selection
of the optimal wavelet base function for the wavelet neural networks. The fourth section
addresses the network training and discusses a learning algorithm. The last section suggests
future research and concludes the paper.


2  Problem  statement 


Given an infinite dimensional stable system with unknown transfer function $G(s)$, we set
up the identification structure shown in Figure 1, in which $u(s)$ and $y(s)$ are the input
to and the output from the unknown system. An adaptive wavelet neural network block
(AWNN) is used to emulate the given system with $z(s)$ as its output. The matching error
$e(s)$ is defined as the difference between $y(s)$ and $z(s)$. The network is tuned to match the
system by minimizing the error $e(s)$.

The structure of an adaptive wavelet neural network is shown in Figure 2, in which
$u(s)$ is the input to both the system and the network, and $z(s)$ is the corresponding output.
This network contains a hidden layer of an appropriate wavelet basis $\{\psi_{j,l}\}$ obtained by dilating
and shifting a base wavelet $\psi(s)$, which is to be determined via an optimal adaptive scheme
of basis selection. The activation function $\sigma(\cdot)$ is a nonlinear function. One of the possible
forms is the sigmoidal function

$$\sigma(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}. \tag{5}$$
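The sigmoidal function of Equation (5), the ratio of 1 - exp(-2x) to 1 + exp(-2x), is algebraically the hyperbolic tangent. A minimal sketch, with `sigma` as a hypothetical helper name, that evaluates the formula directly and compares it with NumPy's `tanh`:

```python
import numpy as np

def sigma(x):
    """Sigmoidal activation written exactly as in Equation (5)."""
    x = np.asarray(x, dtype=float)
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

xs = np.linspace(-5.0, 5.0, 101)
# Largest deviation between the literal formula and tanh on the sample grid.
max_gap = float(np.max(np.abs(sigma(xs) - np.tanh(xs))))
```

For large negative arguments the literal formula overflows in floating point, so an implementation would probably call `np.tanh` directly.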
Figure  2:  An  AWNN  block


Although this is a general setting, the dynamics of the activation function can be selected
either as a linear or a nonlinear function according to the dynamics of the wavelet blocks.
The output of the network is given by

$$z(s) = \sigma\Big(\sum_{j,l} w_{j,l}\,\psi_{j,l}(s)\Big)\,u(s), \tag{6}$$

where

$$\hat{G}(s) = \sigma\Big(\sum_{j,l} w_{j,l}\,\psi_{j,l}(s)\Big) \tag{7}$$

is the estimated transfer function of the unknown system. The function $\hat{G}(s)$ approximates
the transfer function to a certain level of resolution which depends on the resolution of the
subspaces spanned by the wavelet functions.

The base wavelet function $\psi(x)$ determines the dynamical nature of the adaptive wavelet
neural networks. How to choose the right wavelet function $\psi(x)$ is an important and
nontrivial issue which has drawn recent attention from the signal processing community [18].
Different base wavelet functions generate different subspaces used to approximate
transfer functions in $L^2(R)$ and produce different results. We are interested in finding the
wavelet function which describes the dynamical behavior of the systems in consideration
most closely. When such a wavelet is incorporated, the wavelet network should either have the
best performance for a given complexity or provide a given performance level with
minimum complexity. This is true when $\psi(x)$ is chosen to contain information regarding
the class of the given systems. We say a base wavelet is optimal if a nonnegative additive
information measure $\mathcal{M}$ [5], which describes the distance between a finite length signal and
the wavelet basis generated by $\psi(x)$, is minimized. The information measure is a functional
defined by

$$\mathcal{M} : L^2(R) \times L^2(R) \mapsto R^+. \tag{8}$$

We shall use this optimal wavelet function $\psi(x)$ in our wavelet neural networks for system
approximation. The selection of the optimal base wavelet is discussed in detail in the
section that follows.

We define the random error at instant $k$ for the random sample $(u_k, y_k)$ as the difference
between $y_k$ and $z_k$, with the system output $y_k$ as the desired output for the neural network.
The error at the $k$th instant is defined by

$$e_k = y_k - z_k. \tag{9}$$

The squared-error cost at step $k$ is

$$E_k = \frac{1}{2}\,[y_k - z_k]^2. \tag{10}$$

The accumulated error $E$,

$$E = \sum_k E_k, \tag{11}$$

sums the errors of the first $k$ iterative steps. The network with a minimal matching error
$E$ is required to emulate the unknown system. The identification problem transforms into
trajectory learning in the discrete time domain.
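The error definitions of Equations (9) to (11) amount to a few array operations. A minimal sketch, assuming that Equation (10) carries a factor of one half (the extracted text is ambiguous on this point) and using made-up sample values; `matching_errors` is a hypothetical helper name:

```python
import numpy as np

def matching_errors(y, z):
    """Return e_k (Eq. 9), E_k (Eq. 10, with an assumed factor 1/2) and E (Eq. 11)."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    e = y - z                 # per-sample matching error
    E_k = 0.5 * e ** 2        # per-sample squared-error cost
    return e, E_k, float(np.sum(E_k))

# Illustrative plant outputs y_k and network outputs z_k (not data from the paper).
y = np.array([1.0, 0.5, -0.2, 0.0])
z = np.array([0.8, 0.5, 0.1, -0.1])
e, E_k, E = matching_errors(y, z)
```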

Our problem becomes twofold: selecting the best wavelet basis for a wavelet neural
network, and then training the AWNN to match the unknown plant. First, we need to
find the optimal base wavelet function $\psi^*(x)$ such that the positive cost measure $\mathcal{M}$ is
minimized for the detected dynamical behavior of a given system, i.e.,

$$\psi^*(x) = \arg\min_{\psi} \mathcal{M}(\psi(x), f(x)). \tag{12}$$

Secondly, we need to train the network to emulate the given system in the sense of finding
the optimal weights $\{w_{j,l}\}$ to minimize the cost index $J$, which is

$$J_{opt} = \min_{w} E[w]. \tag{13}$$

The input-output relation of the trained wavelet neural network is used to represent the
transfer function of the given system to facilitate the design of control systems. This forms
a self-tuning system identification scheme via an AWNN.

3  Selection  of  base  wavelet  functions 

We  shall  study  the  problem  of  choosing  the  optimal  wavelet  basis  with  compact  support  of 
an  appropriate  size  in  this  section.  We  first  briefly  review  the  multiresolution  property  of 
wavelet  functions  and  the  conditions  for  generating  a  set  of  compactly  supported  discrete 
wavelet  basis  in  terms  of  properties  of  quadrature  mirror  filter  (QMF)  banks  [19].  We 
then  introduce  the  concepts  of  information  measure  as  a  distance  measure  and  the  optimal 
discrete  orthonormal  wavelet  basis  under  the  information  measure.  A  systematic  approach 
is  being  developed  here  to  derive  the  information  gradient  and  the  best  wavelet  basis.  This 
approach  can  be  implemented  for  real  time  systems  due  to  our  parameterization  of  the 
problem. 

A multiresolution approximation of $L^2(R)$, due to [13], is a sequence $\{V_j\}_{j \in Z}$ of closed
subspaces of $L^2(R)$ such that the following hold, with $Z$ denoting the set of all integers:

$$V_j \subset V_{j+1}, \quad \forall j \in Z \tag{14}$$

$$\bigcup_{j=-\infty}^{+\infty} V_j \text{ is dense in } L^2(R) \quad \text{and} \quad \bigcap_{j=-\infty}^{+\infty} V_j = \{0\} \tag{15}$$

$$f(x) \in V_j \iff f(2x) \in V_{j+1}, \quad \forall j \in Z \tag{16}$$

$$f(x) \in V_j \implies f(x - 2^{-j}k) \in V_j, \quad \forall k \in Z \tag{17}$$

and there is a scaling function $\phi(x) \in L^2(R)$ such that, for all $j \in Z$,

$$\left(\sqrt{2^j}\,\phi(2^j x - l)\right)_{l \in Z} \tag{18}$$

is an orthonormal basis of $V_j$ with $V_j \subset V_{j+1}$. With this setting, $H_j$, the orthogonal
complement of $V_j$ in $V_{j+1}$, can be expressed as

$$V_j \oplus H_j = V_{j+1}, \tag{19}$$

with

$$V_J = \bigoplus_{j=-\infty}^{J-1} H_j. \tag{20}$$

For all $j$, there is a wavelet function $\psi(x)$ such that

$$\left(\sqrt{2^j}\,\psi(2^j x - l)\right)_{l \in Z} \tag{21}$$

is an orthonormal basis of $H_j$. The additional information in an approximation at resolution
$2^{j+1}$ compared with the resolution $2^j$ is contained in the subspace $H_j$, the orthogonal
complement of $V_j$ in $V_{j+1}$. If we define $P_{V_j}$ to be the projection operator onto $V_j$ in
$L^2(R)$ and $I$ to be the identity operator, then

$$P_{V_j} \to I, \quad \text{as } j \to +\infty. \tag{22}$$
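The convergence statement in Equation (22) can be illustrated with the simplest multiresolution approximation. The sketch below assumes the Haar scaling function, for which the projection onto $V_j$ replaces a sampled signal on [0, 1) by its average over each dyadic interval of length $2^{-j}$; the test signal and the name `project_haar` are illustrative assumptions.

```python
import numpy as np

n = 1024                                   # number of samples on [0, 1)
x = (np.arange(n) + 0.5) / n
f = np.sin(2.0 * np.pi * x) + x ** 2       # illustrative test signal

def project_haar(samples, j):
    """Discrete analogue of P_{V_j} for the Haar system: average over 2^j equal blocks."""
    block = len(samples) // 2 ** j
    means = samples.reshape(2 ** j, block).mean(axis=1)
    return np.repeat(means, block)

# Root-mean-square approximation error at resolutions j = 0..10.
errors = [float(np.sqrt(np.mean((f - project_haar(f, j)) ** 2))) for j in range(11)]
```

Because the spaces are nested as in Equation (14), the entries of `errors` do not increase with `j`, and the error vanishes once the resolution reaches the sampling grid.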

A particularly useful setup for our problem is a discrete orthonormal wavelet basis
with compact support. It is useful for real time implementation on digital computers. The
compactness of support provides a means of isolating and detecting signals in a certain
region, which has proven useful in the signal processing community. Both the discrete scaling
function $\phi(x)$ and the discrete wavelet function $\psi$ can be parameterized by a set $\{c_k\}$ with
$k$ belonging to a set of integers.

The scaling function $\phi(t)$, with $t$ denoting discrete time, compactly supported on
$[0, K-1]$, can be expressed as [7]

$$\phi(t) = \sum_k c_k\,\phi(2t - k). \tag{23}$$

The discrete wavelet is given by

$$\psi(t) = \sum_k d_k\,\phi(2t - k), \tag{24}$$

where

$$c_k \neq 0, \quad k \in [0, K-1]. \tag{25}$$

These are the two fundamental equations for the wavelet function $\psi(t)$. The scaling function
$\phi(t)$ can be nonzero only on $[0, K-1]$ due to the finite duration of the sequence $\{c_k\}$. The
base wavelet function obtained through $\phi(t)$ is also compactly supported. The coefficients
$\{c_k\}$ and $\{d_k\}$ can be identified as a low pass filter and a high pass filter respectively. Let
us denote $h_0(k) = c_k/2$ and $h_1(k) = d_k/2$ and take their Fourier transforms

$$H_0(e^{j\omega}) = \sum_k h_0(k)\,e^{-j\omega k} \tag{26}$$

and

$$H_1(e^{j\omega}) = \sum_k h_1(k)\,e^{-j\omega k}. \tag{27}$$
The conditions for compactly supported orthonormal wavelet and scaling functions are
equivalent to the requirement that the matrix

$$\begin{bmatrix} H_0(e^{j\omega}) & H_1(e^{j\omega}) \\ H_0(e^{j(\omega+\pi)}) & H_1(e^{j(\omega+\pi)}) \end{bmatrix} \tag{28}$$

is unitary for all $\omega$ for the two-channel quadrature mirror filter (QMF) bank [1]. This is the
constraint that the parameters $c_k$ should satisfy. In particular, the cross-filter orthonormality
implied by the unitary property is satisfied by the choice [1]

$$H_1(z) = z^{-(K-1)} H_0(-z^{-1}), \quad K \text{ even} \tag{29}$$

or, in the time domain,

$$h_1(k) = (-1)^{k+1} h_0(K - 1 - k), \tag{30}$$

and in addition

$$\int \psi(t)\,dt = 0. \tag{31}$$

As we can see from the above, both the scaling function and the wavelet function depend
on the choice of $\{c_k\}$ for $k \in [0, K-1]$. The base wavelet function depends on the selection
of this set of parameters.
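The unitary condition on matrix (28) and the time domain relation (30) can be verified numerically for a concrete coefficient set. The sketch below assumes the four-tap Daubechies coefficients, scaled so that they sum to 2 (which matches the convention $h_0(k) = c_k/2$ used here); this coefficient set and the helper names are illustrative assumptions, not a choice made in the paper.

```python
import numpy as np

# Four-tap Daubechies scaling coefficients, normalised so that sum(c) = 2 (assumed example).
s3 = np.sqrt(3.0)
c = np.array([1.0 + s3, 3.0 + s3, 3.0 - s3, 1.0 - s3]) / 4.0
K = len(c)

h0 = c / 2.0                                  # low pass filter h_0(k) = c_k / 2
k = np.arange(K)
h1 = (-1.0) ** (k + 1) * h0[K - 1 - k]        # high pass filter from Equation (30)

def freq_response(h, omega):
    """H(e^{j omega}) = sum_k h(k) exp(-j omega k), as in Equations (26) and (27)."""
    n = np.arange(len(h))
    return np.sum(h * np.exp(-1j * omega * n))

def qmf_matrix(omega):
    """The 2 x 2 matrix of (28) evaluated at one frequency."""
    return np.array([
        [freq_response(h0, omega), freq_response(h1, omega)],
        [freq_response(h0, omega + np.pi), freq_response(h1, omega + np.pi)],
    ])

# Largest deviation of U U^H from the identity over a grid of frequencies.
omegas = np.linspace(0.0, 2.0 * np.pi, 33)
max_dev = max(float(np.abs(qmf_matrix(w) @ qmf_matrix(w).conj().T - np.eye(2)).max())
              for w in omegas)

# Discrete counterpart of Equation (31): the wavelet coefficients d_k = 2 h_1(k) sum to zero.
d_sum = float(np.sum(2.0 * h1))
```

A coefficient vector that violates the constraint, for example a perturbed `c`, would give a `max_dev` well above rounding error, which is one way to implement the constraint check on $\{c_k\}$.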

The key to choosing the optimal wavelet base for the AWNN lies in the appropriate
parameterization and the right performance measure, in addition to the accurate interpretation
of physical phenomena. A method is proposed in [18] [10] for choosing a wavelet
for signal representation based on minimizing an upper bound of the $L^2$ norm of the error in
approximating the signal up to the desired scale. Coifman et al. derived an entropy based
algorithm for selecting the best basis from a library of wavelet packets [5]. However, a direct
method to systematically generate the best orthonormal discrete wavelet basis with compact
support is still to be developed. We shall provide here a direct approach to calculate
the best discrete wavelet basis.

We first introduce a distance measure for optimization purposes. Inspired by the work
in [5], we define an additive information measure of entropy type and the optimal basis as
follows.

Definition 3.1 A nonnegative map $\mathcal{M}$ from a sequence $\{f_i\}$ to $R$ is called an additive
information measure if $\mathcal{M}(0) = 0$ and $\mathcal{M}(\{f_i\}) = \sum_i \mathcal{M}(f_i)$.

Definition 3.2 Let $x \in R^N$ be a fixed vector and let $\mathcal{B}$ denote the collection of all orthonormal
bases of dimension $N$. A basis $B \in \mathcal{B}$ is said to be optimal if $\mathcal{M}(Bx)$ is minimal over all
bases in $\mathcal{B}$ with respect to the vector $x$.
We shall define a distance measure between a signal and its decompositions into subspaces
of $L^2(R)$, motivated by the Shannon entropy (Shannon’s formula) [9]

$$H(X) = H(P) = -\sum_{x \in X} P(x) \log P(x), \tag{32}$$

which is interpreted in information theory as a measure of the information content of a
random variable $X$ with distribution $P_X = P$.

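Definition 3.3 Let $H$ be a Hilbert space which is an orthogonal direct sum

$$H = \bigoplus_i H_i; \tag{33}$$

a map $\mathcal{E}$ is called decomposition entropy if

$$\mathcal{E}(v, \{H_i\}) = -\sum_i \frac{\|v_i\|^2}{\|v\|^2} \log \frac{\|v_i\|^2}{\|v\|^2} \tag{34}$$

for $v \in H$, $\|v\| \neq 0$, such that

$$v = \bigoplus_i v_i, \quad v_i \in H_i, \tag{35}$$

and we set

$$p \log p = 0, \quad \text{when } p = 0. \tag{36}$$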

Entropy is a good measure of signal concentration in signal processing and information
theory. The value of $\exp \mathcal{E}(v)$ is proportional to the number of coefficients and the length of
code words necessary to represent the signal to a fixed mean error and to errorless coding
respectively. The number $\|v_i\|^2 / \|v\|^2$ is the equivalent probability measure in the decomposition
entropy. In our system identification formulation, energy concentration is identified with
models of lower order or networks with less complexity.

Let $\psi(t)$ be the base wavelet function and let $\Psi(t)$ represent the orthonormal discrete
wavelet basis of $L^2$ generated by dilation and shifting of $\psi(t)$; similarly, we define $\Psi_j$ to be
the basis of $H_j$. We write $\Psi(t) = \{\psi_{j,l}(t)\}$ and $\Psi_j(t) = \{\psi_{j,l}(t)\}_{l \in Z}$ respectively. We treat
both $\Psi(t)$ and $\Psi_j$ as operators and thus define the following.


Definition 3.4 Let $\Psi$ be a basis given above. A base operation is defined to be a map from
$L^2(R)$ to a set of real numbers, i.e., $\Psi(t) f(t) = \{f_{j,l}\}_{j,l \in Z}$ where $f_{j,l} = \langle f(t), \psi_{j,l}(t) \rangle$ for all
$f(t) \in L^2$.
Figure  3:  Mesh  structure  of  the  projection  space

Consider  Vj,  the  subspace  of  L2(R),  with 

Vj  =  (37) 

Following Equation (7) and Equation (2), let $M$ and $N$ be appropriate positive integers.
Truncating the approximation in Equation (2) to scales up to $M$, we have

$$f(x) = \sum_{j=-M}^{M} \sum_{l=-N}^{N} w_{j,l}\,\psi_{j,l}(x). \tag{38}$$

The subspaces used to approximate the function $f(x)$ have a mesh of size $(2M+1) \times (2N+1)$
as in Figure 3.

Given a function or signal $f(t) \in L^2(R)$ and a base wavelet function $\psi(t)$ with a finite
mesh of size $(2M+1) \times (2N+1)$, we can decompose the signal onto the orthogonal subspaces
as

$$f(t) = \sum_{j=-M}^{M} \sum_{l=-N}^{N} f_{j,l}\,\psi_{j,l}(t). \tag{39}$$

We are going to find the best wavelet base function $\psi(t)$ for a given signal $f(t)$ such that
the additive information measure $\mathcal{M}$ is minimized. The result of the base operation $\Psi f(t)$
appears as the weights on the nodes of the mesh. The weights on the vertical line with
coordinate $j$ are the number set produced by $\Psi_j f(t)$.
Although the decomposition entropy is a good measure of the “distance”, it is not an
additive type of map because the norm $\|v\|$ is used to scale the vector. We thus further
introduce a functional

$$\lambda(v) = -\sum_i \|v_i\|^2 \log \|v_i\|^2, \tag{40}$$

which relates to the decomposition entropy through

$$\mathcal{E}(v, \{H_i\}) = \|v\|^{-2}\,\lambda(v) + \log \|v\|^2. \tag{41}$$

The functional in (40) is an additive measure. Since $\|v\|$ does not depend on the choice of
orthonormal basis, minimizing the former minimizes the latter, and we minimize the functional
$\lambda(\Psi f)$ in seeking the optimal wavelet basis through multiresolution decompositions.
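The relation between the decomposition entropy and the additive functional can be checked on a small example. This sketch takes as input the subspace energies $\|v_i\|^2$, uses natural logarithms, and defines the entropy as minus the sum of $p_i \log p_i$ with $p_i = \|v_i\|^2 / \|v\|^2$; the function names and the sample energies are illustrative assumptions.

```python
import numpy as np

def decomposition_entropy(energies):
    """-sum p_i log p_i with p_i = ||v_i||^2 / ||v||^2, using p log p = 0 at p = 0."""
    e = np.asarray(energies, dtype=float)
    p = e / e.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def additive_measure(energies):
    """lambda(v) = -sum ||v_i||^2 log ||v_i||^2, as in Equation (40)."""
    e = np.asarray(energies, dtype=float)
    e = e[e > 0]
    return float(-np.sum(e * np.log(e)))

energies = np.array([4.0, 1.0, 0.0, 0.5])      # illustrative values of ||v_i||^2
total = energies.sum()                          # ||v||^2

lhs = decomposition_entropy(energies)
rhs = additive_measure(energies) / total + np.log(total)   # right side of Equation (41)

# exp(entropy) behaves like an effective number of occupied subspaces.
spread = np.exp(decomposition_entropy([1.0, 1.0, 1.0, 1.0]))
concentrated = np.exp(decomposition_entropy([4.0, 0.0, 0.0, 0.0]))
```

The two sides agree, and the effective count drops from 4 for evenly spread energy to 1 for fully concentrated energy, which is the sense in which a low entropy corresponds to a low order model.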

The weight of the decomposition of signal $f(t)$ on a subspace $H_j$ is measured by a subnorm
$\|f_j\|$ defined as

$$\|f_j(t)\| = \|\Psi_j f(t)\|, \tag{42}$$

where

$$\|f_j\|^2 = \sum_{l=-N}^{N} f_{j,l}^2. \tag{43}$$

Similarly, the norm of the decomposed signal is given by

$$\|f(t)\|^2 = \sum_{j=-M}^{M} \|f_j\|^2. \tag{44}$$

We need to further find $\partial f_{j,l} / \partial c_k$, which is a measure of the sensitivity of a component
of the signal decomposition in a wavelet basis to changes in the defining parameter set of the base
wavelet. One can solve this through numerical methods from the relations and definitions.
Based on the definition of the information gradient and the properties of the QMF discussed earlier,
we derive an explicit expression as follows.

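Lemma 3.1 The sensitivity gradient of component $\psi_{j,l}$ of the wavelet basis $\Psi$ versus
parameter $c_k$ is given by

$$\frac{\partial \psi_{j,l}}{\partial c_k} = \sqrt{2^j} \sum_n \left[ (-1)^{K-k}\,\phi(2^{j+1}t - 2l - n) + (-1)^{n+1} c_{K-1-n}\,\frac{\partial}{\partial c_k}\phi(2^{j+1}t - 2l - n) \right], \tag{45}$$

where the first term in the bracket contributes only for $n = K - 1 - k$.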
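Proof:

From the fundamental equation of wavelets (24),

$$\frac{\partial \psi_{j,l}}{\partial c_k} = \sqrt{2^j}\,\frac{\partial}{\partial c_k} \sum_n d_n\,\phi(2^{j+1}t - 2l - n). \tag{46}$$

This is

$$\frac{\partial \psi_{j,l}}{\partial c_k} = 2\sqrt{2^j} \sum_n \left[ \frac{\partial h_1(n)}{\partial c_k}\,\phi(2^{j+1}t - 2l - n) + h_1(n)\,\frac{\partial}{\partial c_k}\phi(2^{j+1}t - 2l - n) \right]. \tag{47}$$

From Equation (23), we have

$$\frac{\partial \phi(t)}{\partial c_k} = \phi(2t - k), \tag{48}$$

hence

$$\frac{\partial}{\partial c_k}\phi(2^{j+1}t - 2l - n) = \phi(2^{j+2}t - 4l - 2n - k). \tag{49}$$

We need to find $\partial h_1(n) / \partial c_k$. From the time domain relation (30) of the QMF, we have

$$h_1(n) = (-1)^{n+1} h_0(K - 1 - n) \tag{50}$$

with $h_0$ being compactly supported on $[0, K-1]$. Thus,

$$h_1(n) = \frac{1}{2}(-1)^{n+1} c_{K-1-n}, \tag{51}$$

and there is only one nonzero term, when $K - 1 - n = k$. This yields

$$\frac{\partial h_1(n)}{\partial c_k} = \frac{(-1)^{K-k}}{2}. \tag{52}$$

The lemma is proven through (49) and (52).


□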


This lemma establishes a direct link between the rate of change of the components in the
basis and the variations of the parameters in the fundamental equations of wavelets, which
leads to the next theorem. The theorem shows the relationship between the information
measure and the parameter set $\{c_k\}$, and this relation provides a clue for developing an
algorithm to find the optimal base wavelet function for the AWNN.
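Theorem 3.1 Let $\lambda(\cdot)$ be the additive information measure, $[0, K-1]$ be the compact
support for $\{c_k\}$ and $\Psi$ be the corresponding wavelet basis. Let $f(t)$ be a fixed signal in $L^2(R)$;
then the gradient of the information measure with respect to the set $\{c_k\}$ for the signal is
given by

$$\frac{\partial \lambda(\Psi f(t))}{\partial c_k} = -2 \sum_j \sum_l \sqrt{2^j}\, f_{j,l} \log\left(2\|f_j\|^2\right) \cdot \sum_n \Big[ (-1)^{K-k} \langle f(t), \phi(2^{j+1}t - 2l - n) \rangle + (-1)^{n+1} c_{K-1-n} \langle f(t), \phi(2^{j+2}t - 4l - 2n - k) \rangle \Big], \tag{53}$$

where the first term in the bracket contributes only for $n = K - 1 - k$, since
$\partial h_1(n) / \partial c_k$ vanishes otherwise.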


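Proof:

By the chain rule, we have the information gradient

$$\frac{\partial \lambda(\Psi f(t))}{\partial c_k} = \sum_j \frac{\partial \lambda(\Psi f(t))}{\partial \|f_j\|^2}\,\frac{\partial \|f_j\|^2}{\partial c_k}. \tag{54}$$

The definition of the information measure $\lambda(\Psi f(t))$ in (40) yields

$$\frac{\partial \lambda(\Psi f(t))}{\partial \|f_j\|^2} = -\log \|f_j\|^2 - 1 = -\log\left(2\|f_j\|^2\right), \tag{55}$$

with 2 being the base of the log function. We use the chain rule again,

$$\frac{\partial \|f_j\|^2}{\partial c_k} = 2 \sum_l f_{j,l}\,\frac{\partial f_{j,l}}{\partial c_k}. \tag{56}$$

We have so far

$$\frac{\partial \lambda(\Psi f(t))}{\partial c_k} = -2 \sum_j \sum_l \log\left(2\|f_j\|^2\right) f_{j,l}\,\frac{\partial f_{j,l}}{\partial c_k}. \tag{57}$$

Since

$$\frac{\partial f_{j,l}}{\partial c_k} = \left\langle f(t), \frac{\partial \psi_{j,l}(t)}{\partial c_k} \right\rangle, \tag{58}$$

the result from the previous lemma concludes the proof.


□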
The theorem demonstrates an explicit relation among the gradient of the additive information
measure, the parameter set $\{c_k\}$ and the measured signal $f(t)$. It will facilitate the search
for the optimal wavelet basis due to our parameterization and the information measure.

We have identified the problem of finding the optimal wavelet basis $\Psi$ with that of
finding a parameter set $\{c_k\}$ such that the additive information measure $\lambda$ is minimized.
Once the set $\{c_k\}$ is determined, both the scaling function $\phi$ and the base wavelet function
$\psi$ can be derived. Equipped with the above theorem, the information gradient
is available and different optimization schemes can be applied to solve this problem. We have
developed a basis choosing algorithm based on a steepest descent method as follows. To
simplify notation, we denote the parameter set $\{c_0, c_1, \ldots, c_{K-1}\}$ by a vector $C$.

Algorithm 1 Computation of the optimal wavelet basis

Step 1: Set $i := 1$,

$\lambda_0 := 0$,

mesh parameters $M$, $N$;

Initialize vector $C_0$;

Input $f(t)$.

Step 2: If $C_i$ does not satisfy the constraint,

then modify $C_i$ and repeat Step 2.

Step 3: $C_i := C_{i-1} + \mu_{i-1}\,\partial \lambda / \partial C$.
Step 4: Compute $\phi$ and $\psi$.

Step 5: Compute $\lambda_i$.

Step 6: If $|\lambda_i - \lambda_{i-1}| > \epsilon$,

$i := i + 1$, go to Step 2.

Step 7: Output the optimal basis $\Psi$ and stop.

The mesh size is governed by the choice of the parameters $M$ and $N$. When $M$ and
$N$ tend to infinity, the supporting subspace spanned by the dilations and shifts of the base
wavelet tends to the space $L^2(R)$. The size of the mesh is identified with the complexity of the
resulting AWNN. The constraint on the parameters $c_k$ is dominated by the unitary property
of the QMF bank, which can be transformed into an algebraic equation.

This section has provided a direct method to construct an optimal orthonormal
wavelet basis with compact support. The parameterization of both the information measure
and the base wavelet allows an explicit expression of the information gradient with respect to the
optimization parameters and thus paves the way to an efficient basis choosing algorithm.
This methodology of optimal basis selection in a general setting is useful not only within
this identification structure but also for signal approximation and reconstruction in $L^2(R)$.
The parametrization of cost functionals is not unique; other forms of measures or cost
functions may be introduced according to the context of the actual physical problem.
Figure 4: AWNN training structure


4  Network  training 

This section describes a supervised learning process for the AWNN. The training of an AWNN consists of two stages: a pre-training procedure and an actual training scheme via weight updating. The pre-training is a preparation process that configures or adjusts the basis of the network based upon the output measurements from the unknown system excited by a test signal. The purpose is to equip the network with the appropriate dynamics and to generate an AWNN of a manageable size. The network is then trained with a supervised learning process. The training structure is shown in Figure 4.

During the first stage, the network takes the output of the unknown system excited by a test signal and looks for the best wavelet basis with the switch at the closed position. The algorithm given in the previous section is used here to generate the best wavelet basis $\Psi$ for the AWNN. The dynamical behavior of the AWNN is thus determined by this process. This stage also provides appropriate initial weights with which the network training starts. Since the basis contains the measured information of the unknown system, the required size of the network is reduced compared with a neural network without the dynamical components. This speeds up the network training process.
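
To make the role of the pre-training stage concrete, the following sketch computes inner-product coefficients of a sampled output signal against dilated and shifted copies of a base wavelet; such coefficients could then serve as initial weights. The Haar wavelet is used purely as a stand-in for the optimized basis, and the names `haar`, `psi_jl` and `initial_weights`, the sampling grid, and the synthetic signal are hypothetical choices made for illustration.

```python
import numpy as np

def haar(t):
    """Haar base wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def psi_jl(t, j, l, psi=haar):
    """Dilated and shifted wavelet 2^(j/2) * psi(2^j t - l)."""
    return 2.0 ** (j / 2.0) * psi(2.0 ** j * t - l)

def initial_weights(y, t, scales, shifts, psi=haar):
    """Approximate <y, psi_jl> by a Riemann sum on the uniform grid t."""
    dt = t[1] - t[0]
    return {(j, l): float(np.sum(y * psi_jl(t, j, l, psi)) * dt)
            for j in scales for l in shifts}

t = np.arange(0.0, 1.0, 1.0 / 1024)                   # uniform grid on [0, 1)
y = 0.5 * psi_jl(t, 0, 0) - 2.0 * psi_jl(t, 1, 1)     # synthetic "measured" output
w0 = initial_weights(y, t, scales=[0, 1], shifts=[0, 1])
print(w0)
```

Because the synthetic signal is built from two basis functions, the computed coefficients recover the factors 0.5 and -2.0 for those two indices and are zero for the others, which is a quick sanity check on the projection.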

The next stage is the network training, which is goal-directed learning aimed at minimizing the relevant cost functional. It is supervised learning since a certain pattern of $\psi(x)$ related to the unknown system is used during training. Different training algorithms were discussed in [11], [17] and [2]. Due to the convenience of our problem formulation, we use the backpropagation algorithm in [11] to train the AWNN. The backpropagation algorithm, an extension of the LMS algorithm, modifies the weights at each step with nonlocal error information. This is an implied feedback which closes the loop for adapting the weights of the AWNN. Backpropagation provides a suboptimal solution in the sense of using a finite number of wavelet blocks to approximate the infinite dimensional system. The task here is to minimize the cost functional $J$ of Equation (13), which is rewritten here for convenience.

$$J_{\mathrm{opt}} = \min E[w]. \tag{59}$$

From the structure of the AWNN, we have

$$S = \sum_{j,l} w_{j,l}\,\psi_{j,l}(s)\,u(s) \tag{60}$$

as the input to the sigmoidal function $\sigma(\cdot)$. We update the weight $w_{j,l}(k)$ at the $k$th iteration by a stochastic difference equation


$$w_{j,l}(k+1) = w_{j,l}(k) + q_k\,\Delta w_{j,l}(k) \tag{61}$$

where

$$\Delta w_{j,l}(k) = -\frac{\partial E_k}{\partial w_{j,l}} \tag{62}$$

with the learning coefficients $q_k$ satisfying

$$\sum_k q_k = \infty, \tag{63}$$

$$\sum_k q_k^2 < \infty. \tag{64}$$
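
A schedule such as $q_k = q_0/k$ has a divergent sum and a convergent sum of squares, which is the usual reading of conditions (63) and (64). The short sketch below, with the hypothetical helper `partial_sums`, prints partial sums to show the first series growing without bound while the second stays below $q_0^2\pi^2/6$.

```python
import math

def partial_sums(q0, n):
    """Return (sum of q_k, sum of q_k^2) for q_k = q0 / k, k = 1..n."""
    s1 = sum(q0 / k for k in range(1, n + 1))
    s2 = sum((q0 / k) ** 2 for k in range(1, n + 1))
    return s1, s2

for n in (10, 1_000, 100_000):
    s1, s2 = partial_sums(1.0, n)
    print(n, round(s1, 3), round(s2, 5))   # s1 keeps growing like log n; s2 levels off

bound = math.pi ** 2 / 6                    # limit of the sum of 1/k^2
```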


Condition (63) constrains the sequence $\{q_k\}$ to decrease slowly, while (64) constrains $q_k$ to decrease quickly. The combined effect is to guarantee an appropriate learning rate. The gradient of the cost functional with respect to the weight $w_{j,l}$ is expressed as


$$\frac{\partial J}{\partial w_{j,l}} \tag{65}$$


We refer to the definition of the squared error at step $k$ in Equation (10) and use the subscript $k$ of a variable to denote the value of that variable at instant $k$. By the chain rule, we have


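$$\begin{aligned}
\frac{\partial E_k}{\partial w_{j,l}} &= -(y_k - z_k)\,\frac{\partial z_k}{\partial w_{j,l}} \\
&= -(y_k - z_k)\,\frac{\partial z_k}{\partial S_k}\,\frac{\partial S_k}{\partial w_{j,l}} \\
&= -(y_k - z_k)\,\sigma'(S_k)\,\psi_{j,l}\,u_k.
\end{aligned} \tag{66}$$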
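
The chain-rule expression can be checked numerically on a single sample. The sketch below assumes $E_k = \tfrac{1}{2}(y_k - z_k)^2$ with $z_k = \sigma(S_k)$ and a logistic $\sigma$; Equation (10) is not reproduced in this section, so the factor $\tfrac{1}{2}$ and the choice of sigmoid are assumptions. The vector `phi` is a hypothetical stand-in holding the products $\psi_{j,l}u_k$ for the active wavelets.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def error(w, phi, y):
    """E = 0.5 * (y - sigma(w . phi))^2 for one sample."""
    return 0.5 * (y - sigmoid(np.dot(w, phi))) ** 2

def analytic_grad(w, phi, y):
    """dE/dw = -(y - z) * sigma'(S) * phi, with sigma'(S) = z * (1 - z)."""
    z = sigmoid(np.dot(w, phi))
    return -(y - z) * z * (1.0 - z) * phi

def numeric_grad(w, phi, y, h=1e-6):
    """Central finite differences, one weight at a time."""
    g = np.zeros_like(w)
    for n in range(len(w)):
        e = np.zeros_like(w)
        e[n] = h
        g[n] = (error(w + e, phi, y) - error(w - e, phi, y)) / (2 * h)
    return g

rng = np.random.default_rng(0)
w, phi, y = rng.normal(size=5), rng.normal(size=5), 0.7
max_gap = np.max(np.abs(analytic_grad(w, phi, y) - numeric_grad(w, phi, y)))
print(max_gap)   # small: the two gradients agree to finite-difference accuracy
```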
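Hence

$$\begin{aligned}
\Delta w_{j,l} &= \bigl(y_k - \sigma(S_k)\bigr)\,\sigma'(S_k)\,\psi_{j,l}\,u_k \\
&= \Bigl(y_k - \sigma\Bigl(\sum_{j',l'} w_{j',l'}\,\psi_{j',l'}\,u_k\Bigr)\Bigr)\,\sigma'\Bigl(\sum_{j',l'} w_{j',l'}\,\psi_{j',l'}\,u_k\Bigr)\,\psi_{j,l}\,u_k
\end{aligned} \tag{67}$$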

as the weight updating scheme. The general backpropagation algorithms can be found in [11]. This process starts by assigning $y_{j,l}$, the coefficients from the base operation $\Psi y$ of the measured output $y(s)$ onto the wavelet basis of the AWNN, to $w_{j,l}(0)$. The trained wavelet neural network is to be used to implement control system design. The reconstruction from the given wavelet base is the approximation of the plant up to a certain resolution. Summarizing the above yields the following algorithm.

Algorithm 2 AWNN training scheme

Step 1: Set $i := 1$,
        $J_0 := 0$;
        input $\Psi$;
        set $w_{j,l}(0) := y_{j,l}$.

Step 2: $w_{j,l}(i) := w_{j,l}(i-1) + q_{i-1}\,\Delta w_{j,l}(i-1)$.

Step 3: Compute $J_i$.

Step 4: If $|J_i - J_{i-1}| > \epsilon$,
        set $i := i + 1$ and go to Step 2.

Step 5: Stop.
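
A compact sketch of this training loop is given below. It is an illustration under stated assumptions rather than the authors' code: the feature matrix `phi` (products of wavelet values and input) is a random stand-in, the targets are generated from a hypothetical weight vector `w_true`, the sigmoid is logistic, the cost $J$ is taken as the mean of the half squared errors over the training set, the correction is averaged over all samples (a batch variant of the per-sample update), and the step size is $q_0/i$. The initial cost is evaluated at the starting weights instead of being set to zero.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_awnn(phi, y, w0, q0=1.0, eps=1e-10, max_iter=5_000):
    """Iterate w(i) = w(i-1) + q * dw(i-1) until |J_i - J_{i-1}| <= eps.
    phi: (samples, weights) feature matrix, y: targets in (0, 1)."""
    w = np.array(w0, dtype=float)
    z = sigmoid(phi @ w)
    j_prev = float(np.mean(0.5 * (y - z) ** 2))
    history = [j_prev]
    for i in range(1, max_iter + 1):
        # dw = average over samples of (y - z) * sigma'(S) * phi
        dw = ((y - z) * z * (1.0 - z)) @ phi / len(y)
        w = w + (q0 / i) * dw
        z = sigmoid(phi @ w)
        j_cur = float(np.mean(0.5 * (y - z) ** 2))
        history.append(j_cur)
        if abs(j_cur - j_prev) <= eps:     # cost has stopped changing: stop
            break
        j_prev = j_cur
    return w, history

rng = np.random.default_rng(1)
phi = rng.normal(size=(200, 4))            # stand-in for psi_{j,l} * u samples
w_true = np.array([1.0, -0.5, 0.25, 2.0])  # hypothetical "plant" weights
y = sigmoid(phi @ w_true)
w_fit, history = train_awnn(phi, y, w0=np.zeros(4))
print(history[0], history[-1])
```

Starting from zero weights is the least informed choice; in the scheme described above the initial weights come from the pre-training stage, which would replace `np.zeros(4)` in the usage lines.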

Neural networks are another way of fitting curves to available data. They have both advantages and disadvantages. They are conceptually simple, easy to use, and adaptable to complicated problems, including problems that do not have a modeled structure or are too complicated to model. Another advantage is that neural networks offer a distributed, parallel processing ability and thus provide integrity and possible fault tolerance. The function of each neuron is usually a simple one that is easy to implement. The most obvious disadvantage is that neural networks do not recognize and preserve the structures of the systems they deal with, and there is no systematic way to determine the structures of the networks. Embedding dynamical components that depend on the problem context into the networks will be useful in overcoming these disadvantages. Our attempt at designing an AWNN has research potential in this regard.

The AWNN can be structured differently. For example, instead of using only one hidden layer, we can use a multi-layer neural network. One such structure is a two-layer format in which each neuron in the hidden layer is responsible for a subspace of fixed scale, while the neuron in the output layer sums the results from all the subspaces. This structure may facilitate the computation. Different computational structures are to be compared for the best result.
5  Conclusions


We have developed an algorithm for identification of infinite dimensional systems via an adaptive wavelet neural network. We first solve the problem of selecting the compactly supported optimal wavelet base function for spanning the subspaces in which the unknown system is approximated up to a predetermined resolution. An algorithm is given for constructing the optimal basis $\Psi$ for the network emulator based on the measurements of the output from the unknown system. We then apply a backpropagation algorithm to train the resulting AWNN for system approximation. This is an efficient way of approximating an infinite dimensional system up to a certain resolution in a subspace of $L^2(\mathbb{R})$ spanned by the dilations and shifts of the optimal base wavelet. Our method combines the advantage of the multiresolution property of wavelet decompositions and the convenience of the computational structures of neural networks. The marriage of the best from both fields should provide a powerful tool kit for solving problems of a much wider range. Our approach can be generalized to the $N$-dimensional case with signals from $L^2(\mathbb{R}^N)$. The methodology developed in this paper is expected to be useful not only for system identification and autoregressive modeling but also for signal classification, signal compression and reconstruction. Future research is needed on these aspects.